[00:00.000 --> 00:06.740]  Hello, everybody. It's great to be here at Cloud Village. And in the next 30 minutes or so,
[00:06.740 --> 00:10.740]  we're going to talk about static analysis of infrastructure as code.
[00:11.300 --> 00:18.320]  Just as a side note, if you're interested in contributing to any of those open source
[00:18.320 --> 00:26.080]  projects around AWS security, IAM security, infrastructure as code, or Google Cloud,
[00:26.080 --> 00:32.220]  feel free to reach out. It's Barak Schoster on Twitter, or just follow my activity on GitHub.
[00:32.220 --> 00:37.420]  I really like to work on open source projects, and it is one of the things that is great about
[00:37.420 --> 00:43.280]  Bridgecrew, which is contributing more and more open source security tooling back to the community.
[00:44.860 --> 00:51.100]  All right. So on the agenda today, we're going to talk about the state of open source
[00:51.100 --> 00:57.840]  Terraform modules. Are they secure or not? We'll go over how to run Checkov
[00:58.460 --> 01:03.220]  using two different integrations, a pre-commit hook and a CI deployment.
[01:04.540 --> 01:11.320]  And at the end of this talk, you should be able to just download Checkov and run it on your own
[01:11.320 --> 01:19.760]  manifests. Before diving into deployment, let's talk about the problem space. As engineers,
[01:19.760 --> 01:29.800]  like myself, and probably you who are listening, our goal is to move faster and in a much more agile way.
[01:31.340 --> 01:38.600]  And when we develop a new product, we need some way to deploy the infrastructure that
[01:38.600 --> 01:46.420]  this product runs on. Infrastructure as code came to the world to help us
[01:46.420 --> 01:53.520]  do all this manual work in the cloud provider consoles, like AWS, Azure, and GCP, of setting up
[01:53.820 --> 02:03.640]  a VM automatically, and to make the job of setting up those instances reproducible, version controlled,
[02:04.160 --> 02:10.280]  and peer reviewed. Since for the first time, it's in code, you can peer review it as a pull request
[02:10.280 --> 02:18.240]  or merge request in your VCS. When you are working fast, you're producing probably more code,
[02:18.240 --> 02:24.280]  and with more code come more bugs. So a lot of the time, when you want to just provision
[02:24.920 --> 02:31.200]  a new cloud resource, let's take S3, for example, you might miss some of the default configurations
[02:31.200 --> 02:37.640]  and leave the S3 bucket unencrypted, or without versioning, or without access logs.
[02:37.640 --> 02:43.220]  And the same goes for other resources like compute resources. You might leave access
[02:43.220 --> 02:50.080]  keys in plain text environment variables instead of using a secure store. And when
[02:50.080 --> 02:57.720]  you're copying pre-made templates from the internet that are unvetted, those issues might repeat
[02:57.720 --> 03:04.620]  more and more. So Gartner said, and this is often quoted, that 95% of cloud security issues
[03:04.620 --> 03:13.040]  are related to configuration errors. And the trend over the past two or three
[03:13.040 --> 03:18.640]  years, and I know that Terraform has existed for a little more than seven years, is that we're codifying
[03:18.640 --> 03:25.530]  each and every activity that we're doing in YAML files, HCL files, etc.
[03:28.570 --> 03:35.770]  This tooling, those manifests that we write, the infrastructure as code, is about provisioning
[03:35.770 --> 03:42.070]  new resources in a reproducible way and about managing them. But it lacks security
[03:42.070 --> 03:47.030]  functionality or best practices vetting functionality. And there are some great
[03:47.030 --> 03:55.070]  tools around this area. There is OPA, the Open Policy Agent of the CNCF, and also Checkov,
[03:55.070 --> 04:00.910]  tfsec, and cfn_nag. Each of those is tackling the problem from a different angle.
[04:01.590 --> 04:07.470]  So let's talk about the problem that we're trying to solve using Checkov, which is open source and
[04:07.470 --> 04:16.130]  under the Apache 2.0 license. What we try to understand is whether the default configuration of any kind
[04:16.890 --> 04:22.630]  of infrastructure as code manifest, meaning Kubernetes, Terraform, CloudFormation,
[04:22.630 --> 04:33.010]  Serverless Framework, etc., is secure by default. So we asked ourselves that question.
[04:33.010 --> 04:41.870]  Let's first scan open source modules of those cloud manifests.
[04:42.010 --> 04:48.990]  So we decided to use Checkov to scan thousands of open source repositories.
[04:49.770 --> 04:55.010]  Before diving into the results, there are some interesting stats about the trend of
[04:55.010 --> 05:00.930]  contributing and creating new infrastructure modules. And this is focused only on Terraform
[05:00.930 --> 05:12.650]  modules. So back in 2017, 2018, and 2019, we had mild growth in the number of new open source modules
[05:12.650 --> 05:20.570]  contributed around Terraform. And specifically in the COVID era, around February,
[05:20.570 --> 05:27.090]  probably due to work from home productivity benefits, the contributions of new modules
[05:27.090 --> 05:33.530]  went sky high, which is great. With open source, more code that someone else is writing
[05:33.530 --> 05:39.250]  is less code for me to write. But the thing is, the more code that is being contributed,
[05:39.250 --> 05:49.110]  the more misconfigs are being found in the open source repositories. So 48% of those modules
[05:49.110 --> 05:59.970]  have some kind of misconfiguration. So if we take about 2,500 modules, we'll see a lot of
[05:59.970 --> 06:07.010]  misconfigs there. And those open source modules, by GitHub statistics, had more than 26 million
[06:07.010 --> 06:14.390]  downloads. And that's a lot of downloads of misconfigured EC2 instances, S3 buckets, etc.
[06:14.850 --> 06:20.490]  So just to make things clear, writing code in Terraform does not mean that it's insecure.
[06:20.490 --> 06:26.030]  It's the other way around. Terraform gives the ability to peer review the infrastructure as code
[06:26.030 --> 06:30.570]  and to have an impact on the configuration before provisioning.
[06:31.330 --> 06:37.830]  The thing is that this practice of reviewing the infrastructure configuration
[06:38.650 --> 06:45.070]  is still not in place in a lot of organizations. And what we see is that backup and recovery
[06:45.070 --> 06:51.110]  is very often missed. Logging and auditing, encryption, Kubernetes configuration,
[06:51.850 --> 06:56.810]  are also missed. You can understand it for Kubernetes, which has so many knobs and switches,
[06:56.810 --> 07:04.090]  but encryption should really be turned on by default on most of the cloud services.
[07:04.830 --> 07:13.510]  And if we break it down by the different cloud vendors, we can see that there is a different
[07:14.330 --> 07:21.390]  missing piece in each of those open source modules. For example, in Google, there is often a
[07:21.390 --> 07:31.810]  misconfigured networking piece, while on AWS, Kubernetes is the most often misconfigured code.
[07:32.070 --> 07:38.430]  So where do those bad configurations come from, usually? So you can copy them from a blog post,
[07:38.430 --> 07:44.330]  from Terraform registry, from GitHub, from your internal repository where you probably have a
[07:44.330 --> 07:53.570]  platform team creating those manifests. And usually it's not the result of a
[07:53.570 --> 08:01.250]  bad actor, but merely a lack of knowledge or a lack of time around all of those best practices. You have
[08:01.250 --> 08:08.950]  over 160 services on AWS alone, and the list of their configurations grows on and on.
[08:08.950 --> 08:17.070]  So it's really a lot to miss, and a lot to be familiar with as a DevOps, SRE, or platform engineer.
[08:17.590 --> 08:24.850]  And here comes Checkov. So Checkov was released in December 2019. It already has
[08:25.670 --> 08:33.630]  more than 1,000 GitHub stars, 600,000 downloads, and tons of opinionated best practices
[08:33.630 --> 08:41.130]  that come with it. So more than 400 checks that are opinionated and based on the CIS
[08:41.130 --> 08:50.090]  benchmarks, SOC 2, PCI, and other best practices. And it has a bunch of CI/CD integrations.
[08:52.510 --> 09:00.070]  So what Checkov is doing is giving you policy as code. It gives you the ability to define a
[09:00.070 --> 09:07.750]  best practice, and to version the best practice itself. This policy, for example one that makes
[09:07.750 --> 09:12.790]  sure everything is encrypted by default, can be peer-reviewed and can be automated as part of the
[09:12.790 --> 09:18.330]  software development lifecycle. And specifically in Checkov, we decided that policies should be
[09:18.330 --> 09:25.990]  written in a familiar language like Python. So let's take this example. We have here a
[09:25.990 --> 09:34.430]  block that defines a database. And what is wrong here is that storage is not encrypted
[09:34.430 --> 09:40.790]  in this specific database configuration. What would a Checkov policy look like?
[09:40.790 --> 09:47.930]  Well, when creating a custom policy, you can give it a name, an identifier, the type of resource that
[09:47.930 --> 09:53.290]  you would like to inspect, which is aws_db_instance, and a category, which is encryption.
[09:53.290 --> 10:02.130]  And the scan itself is just, let's look for storage_encrypted and make sure that it equals true.
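
The policy described here can be sketched in plain Python. This is a minimal stand-in under stated assumptions, not Checkov's actual code: a real custom policy would subclass a base class shipped with the checkov package, which this sketch deliberately does not import. The class name, the identifier CKV_CUSTOM_1, and the convention that each attribute value arrives wrapped in a list are all assumptions made for illustration.

```python
from enum import Enum


class CheckResult(Enum):
    """Local stand-in for a pass/fail result type."""
    PASSED = "PASSED"
    FAILED = "FAILED"


class RdsEncryptionCheck:
    """Illustrative policy: a name, an identifier, a resource type, a category."""

    name = "Ensure all data stored in the RDS instance is encrypted at rest"
    id = "CKV_CUSTOM_1"  # hypothetical identifier
    supported_resources = ["aws_db_instance"]
    category = "ENCRYPTION"

    def scan_resource_conf(self, conf):
        # Assumption: the parsed resource block is a dict whose values are
        # wrapped in lists, e.g. {"storage_encrypted": [True]}.
        if conf.get("storage_encrypted") == [True]:
            return CheckResult.PASSED
        # A missing attribute counts as a failure, since encryption is off
        # unless it is explicitly enabled.
        return CheckResult.FAILED


check = RdsEncryptionCheck()
encrypted = check.scan_resource_conf({"storage_encrypted": [True]})
unencrypted = check.scan_resource_conf({"engine": ["mysql"]})
print(encrypted.name, unencrypted.name)
```

Treating a missing attribute as a failure is the design choice that matters in this sketch: the talk's example database fails precisely because the encryption setting was left out, not because it was set to false.
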
[10:02.350 --> 10:09.230]  If it does, let's pass it. Otherwise, let's fail the check. And Checkov is doing all of the wrapping,
[10:09.230 --> 10:17.090]  providing a very clear reporting option and a very clear CI integration. And we'll take a look at
[10:17.090 --> 10:25.110]  what that looks like in the CLI. So what I'm going to do now is I'm going to demonstrate Checkov
[10:25.110 --> 10:33.190]  on a vulnerable by design project called TerraGoat. And if we have time, we'll do the same for
[10:33.190 --> 10:45.060]  Kubernetes Goat and CloudFormation Goat. So let's spin up the demo. All right. So what I have here
[10:46.020 --> 10:55.240]  is just a very simple terminal. I'm using a Mac. Before this talk, I've executed brew install
[10:55.240 --> 11:03.960]  checkov or pip install checkov. Either would work for you. And what I have here are those three
[11:03.960 --> 11:11.520]  projects of vulnerable by design infrastructure. But even in a regular open source module, you will
[11:11.520 --> 11:20.460]  likely find a misconfig. So let's dive into TerraGoat and take a look at the project structure.
[11:20.460 --> 11:28.800]  So I have a Terraform directory that has resources across the three cloud providers.
[11:28.800 --> 11:40.760]  Specifically on AWS, I have a Terraform file for S3 buckets. So over here I have a bucket
[11:40.760 --> 11:48.300]  that is public, not encrypted, without access logs and without versioning. And I have another file
[11:48.300 --> 11:58.880]  for EC2 instances that has plain text access keys. But if I hadn't told you that,
[11:58.880 --> 12:08.460]  it would have taken you a lot of time to see that. So let's run Checkov and see what it can do.
[12:09.240 --> 12:16.640]  So the basic parameter is -d, choosing the directory in which Checkov will scan the
[12:16.640 --> 12:23.480]  infrastructure as code. So what I'm going to do is I'm going to scan the AWS directory
[12:25.020 --> 12:32.280]  and see what the results look like. So it should take a few seconds. And what Checkov found here
[12:32.280 --> 12:41.580]  is 58 passing checks, meaning 58 cases where cloud resources followed best practices,
[12:41.580 --> 12:50.080]  and 53 failed checks. So over here I can see all of those passing use cases. So I have here a database
[12:50.080 --> 12:56.200]  application that is not open on the RDP port, which is good. I've done some good work. But I want to focus
[12:56.200 --> 13:03.280]  on the bad things that it has found. So over here I have a bucket that is not encrypted at rest.
[13:03.280 --> 13:12.400]  And this is why we have a failed check. And the resource is defined between lines 1 and 13. And
[13:12.400 --> 13:17.880]  I can see it here. And I can see that there is no encryption block. I've also written it to myself
[13:17.880 --> 13:26.180]  as a comment. And if I take a look at the next set of resources that are right after S3,
[13:26.820 --> 13:31.980]  here I have an unencrypted RDS instance that is also publicly accessible.
[13:33.620 --> 13:41.560]  And there are also plain text access keys found in this EC2 block. Here they are.
[13:42.680 --> 13:49.280]  Why are plain text access keys bad? Because they're often used for crypto mining attacks.
[13:49.280 --> 13:55.340]  People can take those access keys and deploy a crypto miner. And basically my AWS bill will go
[13:55.340 --> 14:03.020]  up and up and up. All right. So we have installed Checkov using pip install.
[14:03.020 --> 14:09.120]  We've executed it on our local AWS directory. And we found a bunch of issues that we should solve.
[14:10.460 --> 14:17.960]  But that was like a one time execution. How do I make sure that Checkov will run on each and
[14:17.960 --> 14:23.540]  every change that I'm making to my infrastructure as code? And I want Checkov to tell me, hey,
[14:23.540 --> 14:31.080]  you have this specific issue. You should solve it before committing it to your GitHub account.
[14:31.300 --> 14:38.700]  And from there to production. So what Checkov has is actually a pre-commit integration
[14:39.540 --> 14:46.560]  where you can do exactly that. You can configure a pre-commit hook. And in that scenario,
[14:46.560 --> 14:55.680]  before each commit, Checkov will scan your local directory, your local repository,
[14:55.680 --> 15:01.900]  and will prevent you from committing that bad code into your GitHub if it does not
[15:01.900 --> 15:07.620]  follow those best practices. So let's take a look at what that looks like.
[15:09.780 --> 15:19.880]  So if I go up here, what I have configured is a .pre-commit-config.yaml,
[15:20.240 --> 15:28.320]  specifying that I should always run the latest version of Checkov on every commit. So
[15:28.320 --> 15:38.200]  what I've done before this talk is I've created a new Terraform file called S3 new.
[15:38.540 --> 15:48.640]  And it's not committed. Let's try to commit it. So what the pre-commit hook is doing right now
[15:49.220 --> 15:58.360]  is running Checkov upon commit. And it fails my commit. I cannot push new code to my source
[15:58.360 --> 16:09.140]  control unless I fix the issues that are found in my S3 new Terraform file. Which is cool.
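
The blocking behavior shown in the demo can be sketched as a small hook script. This is an illustrative sketch, not the pre-commit integration that ships with Checkov: it assumes only that the scanner exits with a non-zero code when a check fails, and it uses the `-d` flag mentioned earlier in the talk. The function names are hypothetical, and the demonstration at the bottom swaps in stand-in commands so the sketch runs without Checkov installed.

```python
import subprocess
import sys


def run_scan(command):
    """Run the scanner command and return its exit code (0 means success)."""
    completed = subprocess.run(command)
    return completed.returncode


def main(command=("checkov", "-d", ".")):
    """Return 0 to allow the commit, non-zero to block it."""
    exit_code = run_scan(list(command))
    if exit_code != 0:
        print("Commit blocked: fix the failed checks first.", file=sys.stderr)
    return exit_code


# Stand-in commands that simulate a failing and a passing scan.
failing_scan = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing_scan = [sys.executable, "-c", "pass"]

blocked = main(failing_scan)
allowed = main(passing_scan)
print(blocked, allowed)
```

To use something like this as a real Git hook, one would drop the stand-in demonstration, end the file with `sys.exit(main())`, and save it as an executable `.git/hooks/pre-commit`. Git aborts the commit whenever that script exits with a non-zero code.
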
[16:09.140 --> 16:16.580]  Now I don't need to run Checkov manually on each and every change. I can just install the
[16:16.580 --> 16:27.800]  pre-commit hook. And now my workstation cannot upload bad S3 buckets. All right. So we have that.
[16:27.800 --> 16:36.800]  The misconfig is gone. And I cannot use my workstation to write bad Terraform code.
[16:36.820 --> 16:41.660]  But what about workstations that did not deploy a pre-commit hook?
[16:43.500 --> 16:49.840]  The best way to handle that use case is to install Checkov in your CI/CD pipeline as well,
[16:50.480 --> 16:58.260]  where on each change request or pull request, Checkov will execute as part of your CI.
[16:58.940 --> 17:03.980]  And we'll call that Checkov job, for example, infrastructure security tests.
[17:03.980 --> 17:09.240]  And it works just like any unit test. Just as you're running application tests,
[17:09.240 --> 17:13.680]  you can do the same with those 400 best practices of Checkov.
[17:14.880 --> 17:24.680]  If Checkov approves, that is, if the tests pass successfully, you can deploy your infrastructure
[17:24.680 --> 17:30.940]  code to production. If Checkov fails, on the other hand, it will block your pull request from
[17:30.940 --> 17:39.420]  being merged, just like failing unit tests block a merge. So to do that specific
[17:39.420 --> 17:49.340]  piece, there is actually a community-contributed piece, which is a Checkov action.
[17:49.560 --> 18:03.330]  Let's take a look at what it looks like. So if you go to bridgecrewio/checkov-action,
[18:03.330 --> 18:10.630]  you can see a project that is contributed by this handsome fellow called Chris Mavarkis.
[18:11.350 --> 18:19.250]  And configuring the Checkov action in your GitHub Actions workflow is as simple as that. You just
[18:20.870 --> 18:28.570]  reference this specific action in your GitHub workflow. And you can choose which checks to run, whether
[18:28.570 --> 18:34.070]  to run them all, skip a specific one, scan a specific directory, or use any of the other
[18:34.070 --> 18:42.730]  flags that Checkov has. And if you actually do that and add it to your CI pipeline,
[18:42.730 --> 18:50.150]  assuming it's on GitHub Actions, it will look like this. So over here I have a pull request
[18:51.050 --> 18:56.270]  on a repository that demonstrates the Checkov GitHub action.
[18:57.150 --> 19:06.230]  So here I have created a bunch of resources to create a new web server in Terraform.
[19:06.230 --> 19:13.970]  So I have this AWS instance, and the GitHub action has added annotations to my code
[19:13.970 --> 19:22.270]  saying, hey, your AWS instance has hardcoded access keys and it's using unencrypted EBS.
[19:22.270 --> 19:26.350]  And I should really solve those issues here in that code block.
[19:26.430 --> 19:34.230]  And I also have issues with my security group, which has port 22 open to the entire internet.
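
The security group finding can be illustrated with a short check of its own. This is a sketch under stated assumptions rather than Checkov's implementation: the ingress rules are modeled as plain dictionaries with `from_port`, `to_port`, and `cidr_blocks` keys, and the function name is hypothetical.

```python
SSH_PORT = 22
ANYWHERE = "0.0.0.0/0"


def ssh_open_to_world(ingress_rules):
    """Return True if any rule exposes port 22 to every IPv4 address."""
    for rule in ingress_rules:
        # A rule covers SSH when port 22 falls inside its port range.
        covers_ssh = rule["from_port"] <= SSH_PORT <= rule["to_port"]
        open_to_all = ANYWHERE in rule.get("cidr_blocks", [])
        if covers_ssh and open_to_all:
            return True
    return False


risky = [{"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}]
restricted = [{"from_port": 22, "to_port": 22, "cidr_blocks": ["10.0.0.0/16"]}]
wide_range = [{"from_port": 0, "to_port": 65535, "cidr_blocks": ["0.0.0.0/0"]}]

print(ssh_open_to_world(risky), ssh_open_to_world(restricted), ssh_open_to_world(wide_range))
```

Checking the port range rather than an exact match matters here, because a rule that opens all ports exposes SSH just as much as a rule that names port 22 explicitly.
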
[19:34.570 --> 19:40.750]  And I should really solve that too. So Checkov will add a set of annotations for each and
[19:40.750 --> 19:56.330]  every resource that is violating those best practices. So I have this pretty CI pipeline
[19:57.050 --> 20:04.290]  that will check each and every commit. Not only if I'm doing that on my endpoint,
[20:04.290 --> 20:08.810]  but also if I'm doing that directly on GitHub or from an endpoint that does not have a pre-commit
[20:08.810 --> 20:22.290]  hook deployed. So what if I want to see the same being done for Kubernetes?
[20:23.130 --> 20:29.690]  So there is actually a very cool additional project, very similar to TerraGoat,
[20:29.690 --> 20:38.130]  that is about Kubernetes configuration. So it is a project created by Madhu
[20:39.670 --> 20:45.310]  and it's called Kubernetes Goat. It has some amazing scenarios of
[20:46.270 --> 20:54.030]  badly configured Kubernetes deployment manifests. And it also has a great Katacoda walkthrough
[20:54.030 --> 20:59.410]  that I really recommend doing just to get familiar with what the best practices
[20:59.410 --> 21:04.470]  and bad practices are when configuring a Kubernetes cluster or a job.
[21:04.490 --> 21:17.480]  What I have here on my local endpoint is actually this project already cloned.
[21:18.120 --> 21:24.680]  And what I'm going to do is I'm going to run checkov -d on Kubernetes Goat.
[21:30.070 --> 21:36.790]  I'm going to run checkov -d on the scenarios directory. And I've actually created my
[21:36.790 --> 21:43.330]  own scenario just before this talk. And I'm going to see what the results look like.
[21:43.330 --> 21:50.550]  So Checkov will now do the same that it has done for Terraform, but now for Kubernetes manifests.
[21:50.930 --> 21:59.090]  So we have here a bunch of issues in my deployment YAML. I have a lot of them.
[21:59.150 --> 22:05.710]  I have here service account tokens that are mounted when not necessary.
[22:06.250 --> 22:19.230]  I have UID conflicts. I have usage of root on my containers. And I should really solve
[22:19.230 --> 22:26.210]  all of those deployment configurations across my pods. The one thing that is nice,
[22:26.210 --> 22:31.490]  if you're not familiar with that specific configuration and what's the rationale behind
[22:31.490 --> 22:39.830]  it and how to solve it, you can always go to those documentation pages that we have made
[22:39.830 --> 22:47.210]  publicly accessible, explaining the rationale behind this specific check and how to automate
[22:47.210 --> 22:53.230]  the remediation of that specific configuration. So let's say that you're encountering a specific
[22:53.230 --> 22:59.810]  config and you don't realize why you need to solve it. Here's the explanation where you have
[22:59.810 --> 23:07.750]  the step-by-step guidelines to fix that. And obviously if you want to try to fix that
[23:07.750 --> 23:12.530]  automatically, you can try the platform, but you can also use only the open source tools.
[23:13.450 --> 23:22.150]  As for the last piece, there is actually a very cool way to use Checkov as a reporting mechanism,
[23:22.150 --> 23:30.630]  just like reporting on passing and failing unit tests. Checkov has a rich output format
[23:31.590 --> 23:38.450]  that includes JUnit XML. And if you integrate that into a CI system like AWS CodeBuild,
[23:38.450 --> 23:44.210]  you can actually get a very nice report showing the number of passing and failing checks. And in some
[23:44.210 --> 23:51.390]  systems, like Jenkins, you can also show the trend of those configurations over time.
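
To make the reporting idea concrete, here is a minimal sketch of how a CI system might tally a JUnit-style report. The sample XML is hypothetical and hand-written for illustration, not real Checkov output; a real report would come from something like `checkov -d . -o junitxml`. The sketch assumes only the common JUnit convention that a `testcase` element containing a `failure` child is a failed test.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: resource names and messages are made up for the sketch.
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="terraform scan">
    <testcase classname="aws_s3_bucket.data" name="Bucket is encrypted at rest">
      <failure message="No encryption block found"/>
    </testcase>
    <testcase classname="aws_db_instance.default" name="Storage is encrypted">
      <failure message="storage_encrypted is not set"/>
    </testcase>
    <testcase classname="aws_security_group.web" name="RDP is not open to the world"/>
  </testsuite>
</testsuites>
"""


def summarize(report_xml):
    """Count passing and failing test cases in a JUnit-style XML string."""
    root = ET.fromstring(report_xml.strip())
    passed = 0
    failed = 0
    for case in root.iter("testcase"):
        if case.find("failure") is not None:
            failed += 1
        else:
            passed += 1
    return passed, failed


passed_count, failed_count = summarize(SAMPLE_REPORT)
print(f"passed={passed_count} failed={failed_count}")
```

Because the format is the same one unit test frameworks emit, CI systems can render it with their existing test report views without knowing anything about infrastructure as code.
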
[23:53.550 --> 24:01.170]  Yeah, so I guess that's more or less about it on Checkov. Just to give some more references
[24:01.170 --> 24:09.450]  of how to use it, you can always go to checkov.io, start on the installation page
[24:09.450 --> 24:16.370]  and getting started, which is the pip install piece. And right after that you have the next
[24:16.370 --> 24:26.510]  chapters that explain what the existing resource scans are. So Checkov has a list of
[24:28.390 --> 24:37.390]  let's see 422 best practices that are opinionated that you can execute. But if you want to ignore
[24:37.390 --> 24:45.630]  some of those and run only specific ones, you can always suppress tests that you wouldn't like
[24:45.630 --> 24:52.090]  to see. For example, if you have a public bucket and it should be public for website assets,
[24:52.090 --> 24:58.390]  you can just use a very similar annotation to the one that you would use for JUnit
[24:59.190 --> 25:06.790]  test, for example, or pytest, which is checkov:skip, mentioning the check ID and a description
[25:06.790 --> 25:15.530]  of why this resource should not adhere to this best practice. So over here I have an S3 bucket
[25:15.530 --> 25:23.330]  that should be public because it's serving as static hosting for web content.
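
A suppression comment of this kind is easy to recognize mechanically. The sketch below is an illustration of the idea, not Checkov's own parser: it assumes the comment takes the form `checkov:skip=<check ID>:<reason>`, and the bucket name, the check ID CKV_AWS_20, and the helper name are examples chosen for the sketch.

```python
import re

# Assumed comment shape: checkov:skip=<CHECK_ID>:<free-text reason>
SKIP_PATTERN = re.compile(
    r"checkov:skip=(?P<check_id>[A-Z0-9_]+)(?::(?P<reason>.*))?"
)

# Example Terraform source held in a string; the resource is hypothetical.
TERRAFORM_SNIPPET = '''
resource "aws_s3_bucket" "website" {
  #checkov:skip=CKV_AWS_20:Bucket serves public static web content
  bucket = "example-website-assets"
  acl    = "public-read"
}
'''


def find_suppressions(source):
    """Map each suppressed check ID to the reason given in the comment."""
    suppressions = {}
    for line in source.splitlines():
        match = SKIP_PATTERN.search(line)
        if match:
            reason = (match.group("reason") or "").strip()
            suppressions[match.group("check_id")] = reason
    return suppressions


suppressions = find_suppressions(TERRAFORM_SNIPPET)
print(suppressions)
```

Keeping the reason next to the skipped check means the exception is peer reviewed along with the resource, the same way the rest of the infrastructure code is.
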
[25:24.190 --> 25:28.290]  And you can do a very similar thing using a Kubernetes annotation
[25:30.210 --> 25:35.470]  and mention why you want to skip a specific check and then Checkov will not fail
[25:35.470 --> 25:43.430]  on that specific Check ID. There is some documentation also of how to integrate
[25:43.430 --> 25:53.010]  Checkov with CI systems like GitHub Actions, Jenkins, GitLab CI, and also screenshots of what
[25:53.010 --> 25:57.610]  that looks like in those specific environments.
[25:59.000 --> 26:08.170]  All right, so if you have any questions, now would be a great time. I really encourage you to
[26:08.680 --> 26:17.090]  try our open source tools, follow our blog where we update on those, and if you have any
[26:17.090 --> 26:22.990]  questions on contributions that you'd like to make, feel free to reach out over email or Twitter.
[26:23.650 --> 26:29.990]  I would be more than happy to help. Just a quick question from me, what's next for Checkov?
[26:29.990 --> 26:35.010]  Any other integrations that are in the works or any additional new features?
[26:35.430 --> 26:44.070]  Right, so we had a bunch of requirements coming from the community. Some have asked for...
[26:44.070 --> 26:49.910]  So the latest feature is actually ARM templates. We added ARM templates to Checkov so you can
[26:49.910 --> 26:57.430]  use it not only for Terraform, CloudFormation, or Kubernetes, but also for this one.
[26:57.550 --> 27:04.330]  And we also got requirements from the community for Ansible, and I think that's the big
[27:05.030 --> 27:12.350]  next manifest probably. We don't have any resources scheduled yet to handle this one,
[27:12.350 --> 27:20.890]  but any contribution would be accepted. And yeah, it already has most of the CI integrations
[27:20.890 --> 27:27.010]  and Docker and brew installs, so it should be pretty straightforward to use.
[27:27.390 --> 27:33.870]  Do you think if people used Terragrunt modules more, this would help the number of misconfigurations
[27:33.870 --> 27:38.730]  to go down? So Terragrunt is a don't-repeat-yourself
[27:39.450 --> 27:45.090]  framework for Terraform. It's a templating piece that helps to reduce the amount of code
[27:45.090 --> 27:52.430]  that we're writing. So if you're writing less code, you probably create fewer misconfigs, but
[27:52.430 --> 28:01.650]  if the templated configuration lacks the right defaults, then it will not reduce that specific
[28:01.650 --> 28:07.870]  piece. So the answer is yes and no. Checkov specifically does not support Terragrunt
[28:09.210 --> 28:15.570]  completely, but it does show value in some cases. So if you have a Terragrunt project,
[28:15.570 --> 28:19.370]  I would be more than happy to see the results of Checkov on it.
[28:19.370 --> 28:26.110]  We're researching how to make it work even better with templated modules.
